SUBTLEX-AL: Albanian word frequencies based on film subtitles
Iliria International Review – 2013/1 © Felix–Verlag, Holzkirchen, Germany and Iliria College, Pristina, Kosovo

Abstract

Several recent studies have shown that word frequency estimates based on film subtitles explain more of the variance in word recognition performance than traditional word frequency estimates. The present study presents such a frequency estimate for Albanian, based on more than 2 million words from film subtitles. Our results show a high correlation between reaction times (RTs) from a lexical decision (LD) study (120 stimuli) and SUBTLEX-AL frequencies, as well as a high correlation between SUBTLEX-AL and the only existing Albanian frequency list, which covers the 100 most frequent words. These findings suggest that SUBTLEX-AL is a good frequency estimate; furthermore, it is the first Albanian word frequency database covering more than 100 words.
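The frequency estimation and validation described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used to build SUBTLEX-AL: the tokenization rule (lowercased runs of letters), the helper names `count_words`, `per_million`, `log_frequency` and `pearson_r`, and the toy subtitle lines and reaction times are all hypothetical. Raw counts are converted to frequency per million tokens, and log10(count + 1) is used as the predictor, one common convention that keeps words absent from the corpus defined.

```python
import math
import re
from collections import Counter

# Runs of letters: word characters minus digits and underscore.
# In Python 3, \w is Unicode-aware, so Albanian letters such as ë and ç are kept.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def count_words(subtitle_lines):
    """Hypothetical helper: count lowercased word tokens across subtitle lines."""
    counts = Counter()
    for line in subtitle_lines:
        counts.update(tok.lower() for tok in TOKEN_RE.findall(line))
    return counts


def per_million(counts):
    """Convert raw counts to frequency per million tokens."""
    total = sum(counts.values())
    return {word: c * 1_000_000 / total for word, c in counts.items()}


def log_frequency(counts, word):
    """log10(count + 1); the +1 gives unseen words a value of 0 instead of -inf."""
    return math.log10(counts.get(word, 0) + 1)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy subtitle lines (hypothetical), standing in for a subtitle corpus.
    counts = count_words(["Po, unë jam këtu.", "Jo, ti nuk je këtu.", "Po, po!"])
    fpm = per_million(counts)
    # Hypothetical LD reaction times in milliseconds for three stimulus words.
    words = ["po", "këtu", "nuk"]
    rts = [520.0, 560.0, 600.0]
    r = pearson_r([log_frequency(counts, w) for w in words], rts)
    print({w: round(fpm[w], 1) for w in words}, round(r, 3))
```

With these toy values r comes out negative, because the invented RTs grow as frequency falls. In lexical decision, more frequent words are generally recognized faster, so a frequency/RT correlation described as "high" is typically high in magnitude and negative in sign.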
Similar resources
On the advantages of word frequency and contextual diversity measures extracted from subtitles: The case of Portuguese.
We examined the potential advantage of lexical databases based on subtitles and present SUBTLEX-PT, a new lexical database for 132,710 Portuguese words obtained from a 78-million-word corpus of film and television series subtitles, offering word frequency and contextual diversity measures. Additionally, we validated SUBTLEX-PT with a lexical decision study involving 1920 Portuguese words (and ...
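Contextual diversity, mentioned alongside word frequency in the SUBTLEX-PT abstract, is commonly operationalized as the number (or percentage) of films or episodes in which a word occurs, so a word repeated many times in a single film counts only once. The sketch below illustrates that definition only; it is not SUBTLEX-PT's code, and the helper names `contextual_diversity` and `cd_percent` and the toy "films" are hypothetical.

```python
import re
from collections import Counter

# Runs of letters (Unicode-aware in Python 3), so Portuguese ã, ç, é are kept.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def contextual_diversity(documents):
    """Map each word to the number of documents (e.g. films) it occurs in.

    Each document's tokens are collapsed to a set first, so repeated
    occurrences within one document add only 1 to that word's count.
    """
    cd = Counter()
    for text in documents:
        cd.update({tok.lower() for tok in TOKEN_RE.findall(text)})
    return cd


def cd_percent(cd, n_documents):
    """Express contextual diversity as a percentage of all documents."""
    return {word: 100.0 * c / n_documents for word, c in cd.items()}


if __name__ == "__main__":
    # Three hypothetical "films", each reduced to its subtitle text.
    films = ["o sim sim", "sim não", "talvez"]
    cd = contextual_diversity(films)
    print(cd["sim"], round(cd_percent(cd, len(films))["sim"], 2))
```

Here "sim" occurs three times in total but in only two of the three films, so its contextual diversity is 2 (about 66.67%), while its raw frequency count would be 3.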
SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles
BACKGROUND Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to. METHODOLOGY Following recent work by New, Brysbaert, and colleagues in English, French and Du...
Subtitle-Based Word Frequencies as the Best Estimate of Reading Behavior: The Case of Greek
Previous evidence has shown that word frequencies calculated from corpora based on film and television subtitles can readily account for reading performance, since the language used in subtitles closely approximates everyday language. The present study examines this issue in a society with increased exposure to subtitle reading. We compiled SUBTLEX-GR, a subtitle-based corpus consisting of mor...
Assessing the Usefulness of Google Books’ Word Frequencies for Psycholinguistic Research on Word Processing
In this Perspective Article we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decisio...
SUBTLEX-UK: a new and improved word frequency database for British English.
We present word frequencies based on subtitles of British television programmes. We show that the SUBTLEX-UK word frequencies explain more of the variance in the lexical decision times of the British Lexicon Project than the word frequencies based on the British National Corpus and the SUBTLEX-US frequencies. In addition to the word form frequencies, we also present measures of contextual diver...